165 research outputs found

    Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression

    Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\mathrm{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\mathrm{poly}(d))}, \| \cdot \|_p)$; the distortion of our embeddings is only $O(\mathrm{poly}(d))$, and we can compute $\Pi A$ in $O(\mathrm{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$ subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for $\ell_2$. These input-sparsity time $\ell_p$ embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as $(1 \pm \epsilon)$-distortion $\ell_p$ subspace embedding and relative-error $\ell_p$ regression. For $\ell_2$, we show that a $(1+\epsilon)$-approximate solution to the $\ell_2$ regression problem specified by the matrix $A$ and a vector $b \in \mathbb{R}^n$ can be computed in $O(\mathrm{nnz}(A) + d^3 \log(d/\epsilon)/\epsilon^2)$ time; and for $\ell_p$, via a subspace-preserving sampling procedure, we show that a $(1 \pm \epsilon)$-distortion embedding of $\mathcal{A}_p$ into $\mathbb{R}^{O(\mathrm{poly}(d))}$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n)$ time, and we also show that a $(1+\epsilon)$-approximate solution to the $\ell_p$ regression problem $\min_{x \in \mathbb{R}^d} \|A x - b\|_p$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n + \mathrm{poly}(d) \log(1/\epsilon)/\epsilon^2)$ time. Moreover, we can improve the embedding dimension or equivalently the sample size to $O(d^{3+p/2} \log(1/\epsilon)/\epsilon^2)$ without increasing the complexity. Comment: 22 pages
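
    As a rough illustration of what "input-sparsity time" means for the $\ell_2$ case, the sketch below applies a sparse embedding (CountSketch-style) matrix with a single random ±1 entry per column, which is how the Clarkson and Woodruff construction is commonly described. It is not the $\ell_p$ embedding of this paper. The function name `countsketch_embed`, the matrix sizes and the seed are assumptions chosen for the example.

```python
import numpy as np

def countsketch_embed(A, m, rng):
    """Return Pi @ A for a sparse embedding Pi of shape (m, n), without forming Pi.

    Each input row i is sent to one random output row h(i) with a random
    sign s(i), so every entry of A is touched once: O(nnz(A)) work.
    (Illustrative helper, not taken from the paper.)
    """
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)         # bucket h(i) for each input row
    signs = rng.choice([-1.0, 1.0], size=n)   # sign s(i) for each input row
    PA = np.zeros((m, A.shape[1]))
    np.add.at(PA, rows, signs[:, None] * A)   # PA[h(i)] += s(i) * A[i]
    return PA

rng = np.random.default_rng(0)                # assumed seed, for repeatability
A = rng.standard_normal((20000, 5))           # tall matrix with n >> d
PA = countsketch_embed(A, m=2000, rng=rng)

# For a vector in the column space of A, the embedded norm should stay
# close to the original norm.
x = rng.standard_normal(5)
ratio = np.linalg.norm(PA @ x) / np.linalg.norm(A @ x)
print(PA.shape, ratio)
```

    The example uses a dense array for simplicity; with a sparse `A`, the same scatter-add would touch only the nonzero entries.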

    Phenome wide association study of vitamin D genetic variants in the UK Biobank cohort

    Introduction: Vitamin D status is an important public health issue due to the high prevalence of vitamin D insufficiency and deficiency, especially in high-latitude areas. Furthermore, it has been reported to be associated with a number of diseases. In a previous umbrella review of meta-analyses of randomized clinical trials (RCTs) and of observational studies, it was found that plasma/serum 25-hydroxyvitamin D (25(OH)D) or supplemental vitamin D has been linked to more than 130 unique health outcomes. However, the majority of the studies yielded conflicting results and no association was convincing. Aim and Objectives: The aim of my PhD was to comprehensively explore the association between vitamin D and multiple outcomes. The specific objectives were to: 1) update the umbrella review of meta-analyses of observational studies or randomized controlled trials on associations between vitamin D and health outcomes published between 2014 and 2018; 2) conduct a systematic literature review of previous Mendelian Randomization studies on causal associations between vitamin D and all outcomes; 3) conduct a systematic literature review of published phenome-wide association studies, summarizing the methods, results and predictors; 4) create a polygenic risk score of vitamin D related genetic variants, weighted by their effect estimates from the most recent genome-wide association study; 5) encode phenotype groups based on electronic medical records of participants; 6) study the associations between vitamin D related SNPs and the whole spectrum of health outcomes, defined by electronic medical records utilising the UK Biobank study; 7) explore the causal effect of 25-hydroxyvitamin D level on health outcomes by applying novel instrumental variable methods. Methods: First I updated the vitamin D umbrella review published in 2015, by summarizing the evidence from meta-analyses of observational studies and meta-analyses of RCTs published between 2014 and 2018. 
I also performed a systematic literature review of all previous Mendelian Randomization studies on the effect of vitamin D on all health outcomes, as well as a systematic review of all published PheWAS studies and the methodology they applied. Then I conducted original data analysis in a large prospective population-based cohort, the UK Biobank, which includes more than 500,000 participants. A 25(OH)D genetic risk score (weighted sum score of 6 serum 25(OH)D-related SNPs: rs3755967, rs12785878, rs10741657, rs17216707, rs10745742 and rs8018720, as identified by the largest genome-wide association study of 25(OH)D levels) was constructed to be used as the instrumental variable. I used a phenotyping algorithm to code the electronic medical records (EMR) of UK Biobank participants into 1853 distinct disease categories and I then ran the PheWAS analysis to test the associations between the 25(OH)D genetic risk score and 950 disease outcome groups (i.e. outcomes with more than 200 cases). For phenotypes found to show a statistically significant association with 25(OH)D levels in the PheWAS, or phenotypes which were found to be convincing or highly suggestive in previous studies, I developed an extended case definition by incorporating self-reported data collected by the UK Biobank baseline questionnaire and interview. The possible causal effect of vitamin D on those outcomes was then explored by the MR two-stage method, inverse variance weighted MR and Egger’s regression, followed by sensitivity analyses. Results: In the updated systematic literature review of meta-analyses of observational studies or RCTs, only studies on new outcomes which had not been covered by the previous umbrella review were included. A total of 95 meta-analyses met the inclusion criteria. Among the included studies there were 66 meta-analyses of observational studies and 29 meta-analyses of RCTs. 
Eighty-five new outcomes were explored by meta-analyses of observational studies, and 59 new outcomes were covered by meta-analyses of RCTs. In the systematic review of published Mendelian Randomization studies on vitamin D, a total of 29 studies were included. A causal role of 25(OH)D level was supported by MR analysis for the following outcomes: type 2 diabetes, total adiponectin, diastolic blood pressure, risk of hypertension, multiple sclerosis, Alzheimer’s disease, all-cause mortality, cancer mortality, mortality excluding cancer and cardiovascular events, ovarian cancer, HDL-cholesterol, triglycerides and cognitive functions. For the systematic literature review of published PheWAS studies and their methodology, a total of 45 studies were included. The process for implementing a PheWAS study includes the following steps: sample selection, predictor selection, phenotyping, statistical analysis and result interpretation. One of the main challenges is the definition of the phenotypes (i.e., the method of binning participants into different phenotype groups). In the phenotyping step, ICD-curated phenotyping was widely used by previous PheWAS studies, and I also used it in my own analysis. By applying the ICD-curated phenotyping, 1853 phenotype groups were defined among the participants I used. In the PheWAS, only phenotype groups with more than 200 cases were analysed (920 phenotypes). Only the associations between rs17216707 (CYP24A1) and “calculus of ureter” (beta = -0.219, se = 0.045, P = 1.14 × 10^-6), “urinary calculus” (beta = -0.129, se = 0.027, P = 1.31 × 10^-6) and “alveolar and parietoalveolar pneumonopathy” (beta = 0.418, se = 0.101, P = 3.53 × 10^-5) survived Bonferroni correction. Nine outcomes, including systolic blood pressure, diastolic blood pressure, body mass index, risk of hypertension, type 2 diabetes, ischemic heart disease, depression, non-vertebral fracture and all-cause mortality, were explored in MR analyses. 
The MR analysis had more than 80% power for detecting a true odds ratio of 1.2 or larger for binary outcomes. None of the explored outcomes were statistically significant. Results from multiple MR methods and sensitivity analyses were consistent. Discussion: Vitamin D and its association with multiple outcomes have been widely studied. More than 230 outcomes have been linked with vitamin D by meta-analyses of observational studies and RCTs. In contrast, evidence from Mendelian Randomization studies is lacking. In particular, I identified only 20 existing MR studies, and only 13 outcomes were suggested to be causally related to vitamin D. In the systematic literature review of previous PheWAS studies, I summarized the applied methods, predictors and results. Although phenotyping based on ICD codes provided good performance and was widely applied by previous PheWAS studies, phenotyping can be improved if lab data, imaging data and medical notes can be incorporated. Alternative algorithms, which take advantage of deep learning and thus enable high-precision phenotyping, need to be developed. From the PheWAS analysis, the score of vitamin D related genetic variants was not statistically significantly associated with any of the 920 phenotypes tested. In the single variant analysis, only rs17216707 (CYP24A1) showed a statistically significant association with calculus outcomes. Previous studies reported associations between vitamin D and hypercalcemia, hypercalciuria, nephrolithiasis and nephrocalcinosis, which may be due to the role of vitamin D in calcium homeostasis. In the MR analysis, I found no evidence of large to moderate (OR>1.2) causal effects of vitamin D on a very wide range of health outcomes. These included SBP, DBP, hypertension, T2D, IHD, BMI, depression, non-vertebral fracture and all-cause mortality, which have previously been proposed to be influenced by low vitamin D levels. 
Further, even larger studies, probably involving the joint analysis of data from several large biobanks with future IVs that explain a higher proportion of the trait variance, will be required to exclude smaller causal effects, which could have public health importance because of the high population prevalence of low vitamin D levels in some populations.
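
    The Bonferroni step reported in the abstract can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions the abstract does not state: a family-wise significance level of 0.05, and a correction over the 920 phenotypes tested for a single variant. The P values are those quoted in the abstract.

```python
# Assumed settings (not given in the abstract): alpha = 0.05, with the
# correction applied across the 920 phenotypes tested for one variant.
alpha = 0.05
n_tests = 920
threshold = alpha / n_tests  # Bonferroni-adjusted significance threshold

# P values quoted in the abstract for rs17216707 (CYP24A1).
reported = {
    "calculus of ureter": 1.14e-6,
    "urinary calculus": 1.31e-6,
    "alveolar and parietoalveolar pneumonopathy": 3.53e-5,
}

# Keep the associations whose P value falls below the adjusted threshold.
survivors = [name for name, p in reported.items() if p < threshold]
print(f"threshold = {threshold:.3e}")
print(survivors)
```

    Under these assumptions the threshold is about 5.43 × 10^-5, so all three quoted associations fall below it. A stricter correction, for example across all variant-phenotype pairs, would give a different result.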

    Numerical Study on Reasonable Entry Layout of Lower Seam in Multi-seam Mining

    Abstract: Based on the geological conditions of the 6# and 8# coal seams in Xieqiao Coal Mine, a reasonable entry layout for the lower seam in multi-seam mining was studied by FLAC3D numerical simulation. Three entry layouts were put forward for discussion: alternate internal entry layout, alternate exterior entry layout and overlapping entry layout. The stress distribution and displacement characteristics of the surrounding rock were then analyzed for the three layouts by numerical simulation, leading to the conclusion that the alternate internal entry layout, which places the entry in the stress-reduction zone and avoids, to a certain extent, the influence of the abutment pressure from mining the upper coal seam, is a better choice for multi-seam mining. The research results can offer a useful reference for entry layout under similar geological conditions in multi-seam mining.

    Natural convection heat transfer of a straight-fin heat sink

    The influence of mounting angle on the heat dissipation performance of a heat sink under natural convection is investigated in this paper by numerical simulation and experimental testing. It is found that the heat sink achieves its highest cooling power at a mounting angle of 90° and its lowest at 15°, where the cooling power is 6.88% lower than at 90°. A heat transfer stagnation zone is the main factor that affects the cooling power of the heat sink, and its location and area vary with the mounting angle. Cutting the heat transfer stagnation zone is identified as an effective way to improve the heat sink performance.

    A Distributed Graph Approach for Pre-processing Linked RDF Data Using Supercomputers

    Efficient RDF graph-based queries are becoming more pertinent given the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed graph approach to pre-processing linked data. Instead of traversing the memory graph, our system indexes pre-processed join elements that are organized in a graph structure. We analyze the DBpedia dataset (derived from the Wikipedia corpus) and compare our access method to a graph traversal access approach, which we also devise. Our experimental results show that the distributed, pre-processed graph approach to accessing linked data is faster than the traversal approach over a specific range of linked queries.